A Survey on Fault Tolerance Mechanisms for job scheduling in Grid computing
نویسندگان
چکیده
Grid computing is defined as a hardware and software infrastructure that enables sharing of coordinated resources in a dynamic environment. In grid computing, the probability of a failure is much greater than parallel computing. Therefore, the fault tolerance is an important issue in order to achieve reliability, availability of resources. When scheduling a job, the resource uses both average failure time and failure rate of grid resources combined with resources response time to generate scheduling. There are several reasons for failure in execution such as network failure, resource overloading, or non-availability of required software components. Thus, fault-tolerant systems should be able to identify and rectify the failures and support reliable execution in the presence of failures.In this paper, a survey made on various fault tolerance techniques and mechanism and job management in grid computing.
منابع مشابه
Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid
Grid Computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling is an important issue for large scale computational grid because of its unreliable nature of grid resources. Commonly exploited techniques to realize fault tolerance is periodic Checkpointing that periodically ...
متن کاملImproving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
متن کاملFault Tolerance Testing for Crash and Omission Transient Failure during Resource Scheduling of Grid Computing
In computational Grid, fault tolerance is an imperative issue to be considered during job scheduling. Due to the widespread use of resources, systems are highly prone to errors and failures. Hence fault tolerance plays a key role in grid to avoid the problem of unreliability. The two main techniques for implementing fault tolerance in grid environment are check pointing and replication. Grid Co...
متن کاملAnalysis of Fault Tolerance on Grid Computing in Real Time Approach
In computational Grid, fault tolerance is an imperative issue to be considered during job scheduling. Due to the widespread use of resources, systems are highly prone to errors and failures. Hence fault tolerance plays a key role in grid to avoid the problem of unreliability. Scheduling the task to the appropriate resource is a vital requirement in computational Grid. The fittest resource sched...
متن کاملImproving Fault Tolerant Resource Optimized Aware Job Scheduling for Grid Computing
Workflow brokers of existing Grid Scheduling Systems are lack of cooperation mechanism which causes inefficient schedules of application distributed resources and it also worsens the utilization of various resources including network bandwidth and computational cycles. Furthermore considering the literature, all of these existing brokering systems primarily evolved around models of centralized ...
متن کامل